Intra-Option Learning about Temporally Abstract Actions

نویسندگان

Richard S. Sutton

Doina Precup

Satinder P. Singh

چکیده

Several researchers have proposed modeling temporally abstract actions in reinforcement learning by the combination of a policy and a termination condition, which we refer to as an ”option”. Value functions over options and models of options can be learned using methods designed for semi-Markov decision processes (SMDPs). However, these methods all require an option to be executed to termination. In this paper we explore methods that learn about an option from small fragments of experience consistent with that option, even if the option itself is not executed. We call these methods ”intraoption” learning methods because they learn from experience within an option. Intra-option methods are sometimes much more efficient than SMDP methods because they can use off-policy temporal-difference mechanisms to learn simultaneously about all the options consistent with an experience, not just the few that were actually executed. In this paper we present intra-option learning methods for learning value functions over options and for learning multi-step models of the consequences of options. We present computational examples in which these new methods learn much faster than SMDP methods and learn effectively when SMDP methods cannot learn at all. We also sketch a convergence proof for intra-option value learning.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Unified Inter and Intra Options Learning Using Policy Gradient Methods

Temporally extended actions (or macro-actions) have proven useful for speeding up planning and learning, adding robustness, and building prior knowledge into AI systems. The options framework, as introduced in Sutton, Precup and Singh (1999), provides a natural way to incorporate macro-actions into reinforcement learning. In the subgoals approach, learning is divided into two phases, first lear...

متن کامل

GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces

A new family of gradient temporal-difference learning algorithms have recently been introduced by Sutton, Maei and others in which function approximation is much more straightforward. In this paper, we introduce the GQ(λ) algorithm which can be seen as extension of that work to a more general setting including eligibility traces and off-policy learning of temporally abstract predictions. These ...

متن کامل

Temporal Abstraction in TD Networks

Temporal-difference (TD) networks have been proposed as a way of representing and learning a wide variety of predictions about the interaction between an agent and its environment (Sutton & Tanner, 2005). These predictions are compositional in that their targets are defined in terms of other predictions, and subjunctive in that that they are about what would happen if an action or sequence of a...

متن کامل

Temporal Abstraction in TD Networks

متن کامل

Automated State Abstraction for Options using the U-Tree Algorithm

Learning a complex task can be significantly facilitated by defining a hierarchy of subtasks. An agent can learn to choose between various temporally abstract actions, each solving an assigned subtask, to accomplish the overall task. In this paper, we study hierarchical learning using the framework of options. We argue that to take full advantage of hierarchical structure, one should perform op...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 1998

Intra-Option Learning about Temporally Abstract Actions

نویسندگان

چکیده

منابع مشابه

Unified Inter and Intra Options Learning Using Policy Gradient Methods

GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces

Temporal Abstraction in TD Networks

Temporal Abstraction in TD Networks

Automated State Abstraction for Options using the U-Tree Algorithm

عنوان ژورنال:

اشتراک گذاری